






A Closed-form expressions for the robust risks

Neural Information Processing Systems

In Sections A.1 and A.2 we derive closed-form expressions of the standard and robust risks; we first prove Equation (13) and then the second part of the statement. Appendix B provides additional details on our experiments (B.1: neural networks on sanitized binary MNIST); if not mentioned otherwise, we use noiseless i.i.d. data. In C.1 we give an intuitive explanation for the robust overfitting phenomenon, and in C.2 we discuss how inconsistent adversarial training prevents it, shedding light on the phenomena revealed by Theorem 3.1 and Figure 2. We further discuss the robust logistic regression studied in Section 4: as observed in Section 4.4, label noise can prevent interpolation and hence improve the robust risk, so inconsistent training perturbations can induce spurious regularization effects.
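For intuition, the following is a minimal worked example of the kind of closed form such derivations produce, assuming a linear classifier with logistic loss under $\ell_\infty$-bounded perturbations of radius $\epsilon$; this is a standard computation, not necessarily the exact expression derived in the appendix.

```latex
% The inner maximization over the perturbation \delta has a closed form:
% the loss is decreasing in the margin y\,\theta^\top(x+\delta), and the
% worst case \delta = -\epsilon\, y\, \mathrm{sign}(\theta) shrinks the
% margin by \epsilon \|\theta\|_1.
\max_{\|\delta\|_\infty \le \epsilon} \log\bigl(1 + e^{-y\,\theta^\top (x+\delta)}\bigr)
  = \log\bigl(1 + e^{-\left(y\,\theta^\top x \,-\, \epsilon \|\theta\|_1\right)}\bigr),
\quad\text{so}\quad
R_{\mathrm{rob}}(\theta)
  = \mathbb{E}\Bigl[\log\bigl(1 + e^{-\left(y\,\theta^\top x - \epsilon\|\theta\|_1\right)}\bigr)\Bigr].
```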


A Instantaneous Regret Bound

Neural Information Processing Systems

Conditioned on the event that (8) in Lemma 1 holds (with probability $1-\delta$), it follows that … From (18), (19), and (20), we obtain (9), (10), and (11), respectively. We prove Lemma 4, which is then used to prove Lemma 5; Lemma 3 then follows from Lemma 5. Let us consider V-TS, which selects a single query at each BO iteration (Algorithm 3). The simulation returns the location of a pushed object given the robot's location and the pushing duration. There are 30 initial observations, i.e., $|D| = 30$.
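As a rough illustration of the single-query selection step described above, here is a minimal sketch of one Thompson-sampling iteration for BO with a GP surrogate; the RBF kernel, jitter constants, and candidate-set discretization are illustrative assumptions, not the paper's V-TS implementation.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.5):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def thompson_step(X_obs, y_obs, X_cand, noise=1e-3, rng=None):
    """Pick one query point by sampling the GP posterior on a candidate set."""
    rng = rng or np.random.default_rng()
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf_kernel(X_cand, X_obs)
    Kss = rbf_kernel(X_cand, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^{-1} y
    mu = Ks @ alpha                                          # posterior mean
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V                                      # posterior covariance
    f = rng.multivariate_normal(mu, cov + 1e-9 * np.eye(len(X_cand)))
    return X_cand[np.argmax(f)]  # a single query per BO iteration
```

In a BO loop, the returned point would be evaluated on the objective (e.g., the pushing simulation) and appended to `(X_obs, y_obs)` before the next iteration.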




Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds

Metelli, Alberto Maria, Drago, Simone, Mussi, Marco

arXiv.org Machine Learning

We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $\mu(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and explain why existing self-normalized concentration inequalities do not yield tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality by resorting to Freedman's inequality and a stitching argument, which represents a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}(\gamma_T \sqrt{T/\kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs, and provides a unified view of both settings.
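For a sense of the algorithmic template, here is a generic optimistic (UCB-style) query-selection step with an exponential-family mean link; this is a sketch under assumed names and a fixed exploration weight `beta`, not the authors' GKB-UCB, whose confidence width comes from their Bernstein-like inequality.

```python
import numpy as np

def ucb_step(X_obs, y_obs, X_cand, kernel, beta=2.0, lam=1.0):
    """Generic optimistic query selection for a kernelized bandit.

    `beta` is a placeholder exploration weight; the paper derives its own
    confidence width from a self-normalized Bernstein-like inequality.
    """
    K = kernel(X_obs, X_obs) + lam * np.eye(len(X_obs))
    Ks = kernel(X_cand, X_obs)
    K_inv = np.linalg.inv(K)
    mu_f = Ks @ K_inv @ y_obs                       # posterior mean of f
    var_f = kernel(X_cand, X_cand).diagonal() - np.einsum(
        "ij,jk,ik->i", Ks, K_inv, Ks)               # posterior variance of f
    link = lambda f: 1.0 / (1.0 + np.exp(-f))       # EF mean link, e.g. logistic
    index = link(mu_f + beta * np.sqrt(np.maximum(var_f, 0.0)))
    return X_cand[np.argmax(index)]                 # optimistic query
```

Any positive-definite kernel can be plugged in, e.g. the `rbf_kernel` from the earlier sketch; since the link is monotone, applying it after the bonus preserves the optimistic ordering.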


Semi-gradient DICE for Offline Constrained Reinforcement Learning

Kim, Woosung, Seo, JunHo, Lee, Jongmin, Lee, Byung-Jun

arXiv.org Artificial Intelligence

Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on semi-gradient optimization, which solves a fundamentally different optimization problem and leads to failures in cost estimation. Building on these insights, we propose a novel method that enables both OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark DSRL.
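To make the semi- vs. full-gradient distinction the abstract builds on concrete, here is a toy sketch on a linear TD-style objective (not the DICE objective itself); all names and constants are illustrative.

```python
import numpy as np

def gradient_variants(w, phi, phi_next, r, gamma=0.99, lr=0.1):
    """Toy linear value model v(s) = w @ phi(s) with squared bootstrapped error.

    The full (residual) gradient differentiates through the bootstrapped
    target r + gamma * v(s'); the semi-gradient treats that target as a
    constant (a "stop-gradient"), which changes the optimization problem
    and its fixed point -- the crux of the failure mode described above.
    """
    delta = w @ phi - (r + gamma * (w @ phi_next))      # TD-style residual
    w_full = w - lr * delta * (phi - gamma * phi_next)  # full-gradient step
    w_semi = w - lr * delta * phi                       # semi-gradient step
    return w_full, w_semi
```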